Robust Planning with (L)RTDP
نویسندگان
چکیده
Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case where transition probabilities are known to lie in a given interval.
منابع مشابه
Planning with Robust (L)RTDP
Stochastic Shortest Path problems (SSPs), a subclass of Markov Decision Problems (MDPs), can be efficiently dealt with using Real-Time Dynamic Programming (RTDP). Yet, MDP models are often uncertain (obtained through statistics or guessing). The usual approach is robust planning: searching for the best policy under the worst model. This paper shows how RTDP can be made robust in the common case...
متن کاملPAC Reinforcement Learning Bounds for RTDP and Rand-RTDP
Real-time Dynamic Programming (RTDP) is a popular algorithm for planning in a Markov Decision Process (MDP). It can also be viewed as a learning algorithm, where the agent improves the value function and policy while acting in an MDP. It has been empirically observed that an RTDP agent generally performs well when viewed this way, but past theoretical results have been limited to asymptotic con...
متن کاملPAC Reinforcement Learning Bounds for RTDP and Rand-RTDP Technical Report
Real-time Dynamic Programming (RTDP) is a popular algorithm for planning in a Markov Decision Process (MDP). It can also be viewed as a learning algorithm, where the agent improves the value function and policy while acting in an MDP. It has been empirically observed that an RTDP agent generally performs well when viewed this way, but past theoretical results have been limited to asymptotic con...
متن کاملLabeled RTDP: Improving the Convergence of Real-Time Dynamic Programming
RTDP is a recent heuristic-search DP algorithm for solving non-deterministic planning problems with full observability. In relation to other dynamic programming methods, RTDP has two benefits: first, it does not have to evaluate the entire state space in order to deliver an optimal policy, and second, it can often deliver good policies pretty fast. On the other hand, RTDP final convergence is s...
متن کاملA Bounded Q-decomposition RDTP Approach to Resource Allocation
This paper contributes to solve effectively stochastic resource allocation problems known to be NP-Complete. To address this complex resource management problem, two approaches are adapted and merged in an effective way: the Q-decomposition and the bounded Real-time Dynamic Programming (bounded rtdp). The Q-decomposition allows to coordinate reward separated agents and thus permits to reduce th...
متن کامل